from pipeline example described by Pierre-Luc Germain, course sta426 UZH
# set local path
local.path <- getwd()
setwd(local.path)
# use source() function to source the functions we want to execute
source("functions/functions_qc.R")
source("functions/functions_analysis.R")
We look at both the raw and unfiltered data from the cell ranger output, to see what yields the best results. For both data sets, we first ran them through our quality control function. Since the filtered data already has all the empty droplets removed, this step will be skipped for the filtered data, but not for the raw data. In the following section we will quickly go over what steps we took in our quality control function and how we made sure they were sensible.
The first step in our pipeline is to seperate the antibody-derived tag (ADT) data from the remaining mRNA counts and clean them up. Ideally we'd like to identify the cells which failed to capture the ADT's and remove them. This is normally done by removing cells with ADT counts less than or equal to half of the total number of tags. If we do that we loose a huge number of cells as shown in the summary output. We also created a plot showing the distribution of the number of detected ADT's and from there we observe that we have lots of cells with zero ADT counts and a smaller peak at around 50. From this plot we decided to change the threshold at which cells will be discarded to 25, to not throw away as many cells. But even with this change it seems like to many cells will be discarded because of low quality ADT data, which is why we decided to ignore this part of the data altogether.
In the next step, the empty droplets will be removed. As mentioned before this only happens for the raw data. For this we ran the funciton emptyDrops(), which tests if the counts of a cell are significantly different from the ambient mRNA pool. If this is not the case, then it is most likely an empty droplet. After looking at some summary statistics of the output of this function, we realized that there were non-significant cellbarcodes, which were bounded by the number of iterations. We therefore increased the number of iterations in emptyDrops() until this wasn't the case anymore (100'000). There's also a plot showing the histogram of the p-values given by emptyDrops(), which should ideally be approximatelly uniform especially at the beginning. This was luckily the case. It is interesting to note, that this part of the quality control pipeline is responsible for the most amount of discarded cells.
Next we removed the dead cells, e.g. the cells with high mitochondrial gene expression. The approach we used was the same as in class. Lastly we also removed cells, which have low library sizes and detection rates. As seen in class we used the perCellQCMetrics() and isOutlier() functions to do so. Before removing all these cells we looked at the plots showing the above metrics to have an idea what they look like. We also produced a plot showing the log-fold-change of the discarded vs. kept cells. This plot should help decide if an entire cell type is accidently discarded because of our quality control. If that would be the case, we would observe an enrichment of a certain cell type in the discarded group, which can be identified by an increased expression of the corresponding marker genes.
## data can be downloaded from : http://imlspenticton.uzh.ch/dump/files_for_levesque.tar
#TODO: can be deleted?
# patient1_HS.path <- file.path("data", "patient1_HS")
#
# # read in filtered data
# fnameHS <- file.path(patient1_HS.path, "outs/filtered_feature_bc_matrix")
#pick your poison
#sces <- read_previous_data()
sces <- read_and_qc(4)
## [1] "###################################"
## [1] "Quality Control:"
## [1] "patient1_HS"
## [1] "Summary of discarded antibodies:"
## Mode FALSE TRUE
## logical 4387 22
## [1] "Distribution of the number of detected antibodies across all cells (red line = threshold, below which cells would be discarded):"
## [1] "Summary of the cells identified as dead cells:"
## Mode FALSE TRUE
## logical 3472 937
## [1] "Plot of cells, which will be discarded because they are identified as dead cells:"
## [1] "Summary of all cells that will be discarded and for what reasons:"
## DataFrame with 1 row and 4 columns
## LibSize NExprs MitoProp Total
## <integer> <integer> <integer> <integer>
## 1 0 330 937 937
## [1] "Plots looking at some quality metrics calculated by perCellQCMetrics:"
## [1] "Check that we don't accidently get rid of small cell populations because of our cleaning of the data:"
## [1] "Summary of the doublets identified by the scDblFinder algorithm:"
##
## singlet doublet
## 3350 122
## [1] "Percentage of removed cells:"
## [1] 24.01905
## [1] "###################################"
## [1] "Quality Control:"
## [1] "patient1_SCC"
## [1] "Summary of discarded antibodies:"
## Mode FALSE TRUE
## logical 3065 8
## [1] "Distribution of the number of detected antibodies across all cells (red line = threshold, below which cells would be discarded):"
## [1] "Summary of the cells identified as dead cells:"
## Mode FALSE TRUE
## logical 2371 702
## [1] "Plot of cells, which will be discarded because they are identified as dead cells:"
## [1] "Summary of all cells that will be discarded and for what reasons:"
## DataFrame with 1 row and 4 columns
## LibSize NExprs MitoProp Total
## <integer> <integer> <integer> <integer>
## 1 0 410 702 702
## [1] "Plots looking at some quality metrics calculated by perCellQCMetrics:"
## [1] "Check that we don't accidently get rid of small cell populations because of our cleaning of the data:"
## [1] "Summary of the doublets identified by the scDblFinder algorithm:"
##
## singlet doublet
## 2316 55
## [1] "Percentage of removed cells:"
## [1] 24.63391
## [1] "###################################"
## [1] "Quality Control:"
## [1] "patient2_HS"
## [1] "Summary of discarded antibodies:"
## Mode FALSE TRUE
## logical 8206 38
## [1] "Distribution of the number of detected antibodies across all cells (red line = threshold, below which cells would be discarded):"
## [1] "Summary of the cells identified as dead cells:"
## Mode FALSE TRUE
## logical 6837 1407
## [1] "Plot of cells, which will be discarded because they are identified as dead cells:"
## [1] "Summary of all cells that will be discarded and for what reasons:"
## DataFrame with 1 row and 4 columns
## LibSize NExprs MitoProp Total
## <integer> <integer> <integer> <integer>
## 1 0 401 1407 1407
## [1] "Plots looking at some quality metrics calculated by perCellQCMetrics:"
## [1] "Check that we don't accidently get rid of small cell populations because of our cleaning of the data:"
## [1] "Summary of the doublets identified by the scDblFinder algorithm:"
##
## singlet doublet
## 6360 477
## [1] "Percentage of removed cells:"
## [1] 22.85298
## [1] "###################################"
## [1] "Quality Control:"
## [1] "patient2_AK"
## [1] "Summary of discarded antibodies:"
## Mode FALSE TRUE
## logical 5028 8
## [1] "Distribution of the number of detected antibodies across all cells (red line = threshold, below which cells would be discarded):"
## [1] "Summary of the cells identified as dead cells:"
## Mode FALSE TRUE
## logical 4089 947
## [1] "Plot of cells, which will be discarded because they are identified as dead cells:"
## [1] "Summary of all cells that will be discarded and for what reasons:"
## DataFrame with 1 row and 4 columns
## LibSize NExprs MitoProp Total
## <integer> <integer> <integer> <integer>
## 1 0 319 947 950
## [1] "Plots looking at some quality metrics calculated by perCellQCMetrics:"
## [1] "Check that we don't accidently get rid of small cell populations because of our cleaning of the data:"
## [1] "Summary of the doublets identified by the scDblFinder algorithm:"
##
## singlet doublet
## 3907 179
## [1] "Percentage of removed cells:"
## [1] 22.41859
#sces <- read_10x()
sces <- lapply(sces, split_data)
# only select the gene expression data, drop the antibody capture
sce.patient1_HS <- sces[[1]]
sce.patient1_SCC <- sces[[2]]
sce.patient2_HS <- sces[[3]]
sce.patient2_AK <- sces[[4]]
# add name to sce for output graph
attr(sce.patient1_HS, "name") <- "Patient1 HS"
attr(sce.patient1_SCC, "name") <- "Patient1 SCC"
attr(sce.patient2_HS, "name") <- "Patient2 HS"
attr(sce.patient2_AK, "name") <- "Patient2 AK"
#convert rownames
# rownames(sce.patient1_HS) <- convert_rownames(sce.patient1_HS)
# rownames(sce.patient1_SCC) <- convert_rownames(sce.patient1_SCC)
# rownames(sce.patient2_HS) <- convert_rownames(sce.patient2_HS)
# rownames(sce.patient2_AK) <- convert_rownames(sce.patient2_AK)
#make sure every rowname has the format ENS_ID.Gene_Name
rownames(sce.patient1_HS) <- paste_rownames(sce.patient1_HS)
rownames(sce.patient1_SCC) <- paste_rownames(sce.patient1_SCC)
rownames(sce.patient2_HS) <- paste_rownames(sce.patient2_HS)
rownames(sce.patient2_AK) <- paste_rownames(sce.patient2_AK)
The markers found in the literature D. Kim, Chung, and Kim (2020) Solé-Boldo (2020) were used to separate the different cell types and their subtypes within them. A few markers didn't appear at all in our data (WISP2 [Secretory-reticular fibroblast], SEPP1 [Macrophage], TYP1 [Melanocyte]) so the corresponding cell types were selected using their remaining markers present. The Pericyte and vSMC markers were removed because they clustered badly and polluted the Fibroblast population. Since those markers are known to correlate with specific cell types, their genes were added to the list of high variable genes used to produce a lower dimensional representation of the data, even when their expression didn't vary enough to be selected as highly variable.
| Types | Subtypes | Markers |
|---|---|---|
| Keratinocyte | DSC3 DSP LGALS7 | |
| basal | KRT5 KRT14 COL17A1 | |
| suprabasal | KRT1 KRT10 | |
| differentiated | LOR SPINK5 | |
| ORS | KRT6B | |
| channel | GJB2 GJB6 ATP1B1 | |
| sebaceous gland | MGST1 FASN | |
| sweat gland | DCD AQP5 | |
| Fibroblast | DCN LUM MMP2 | |
| secretory reticular | SLPI CTHRC1 MFAP5 TSPAN8 | |
| proinflammatory | CCL19 APOE CXCL2 CXCL3 EFEMP1 | |
| secretory papillary | APCDD1 ID1 WIF1 COL18A1 PTGDS | |
| mesenchymal | ASPN POSTN GPC3 TNN SFRP1 | |
| Pericyte & vSMC | ACTA2 TAGLN | |
| pericyte | RGS5 | |
| vSMC | MYL9 TPM2 RERGL | |
| Endothelial cell | PECAM1 SELE CLDN5 VWF | |
| lymphatic | PROX1 LYVE1 | |
| Myeloid | HLA-DRA FCER1G TYROBP AIF1 | |
| dendritic | CD1C | |
| langerhans | CD207 | |
| macrophage | CD68 RNASE1 ITGAX | |
| Lymphocyte | CD3D CD3E CD52 IL7R | |
| Melanocyte | DCT MLANA PMEL | |
| Schwann cell | CDH19 NGFR | |
| Mitotic cell | UBE2C PCNA |
# run PCA
# using known genes
# genes not found in the data : WISP2" "SEPP1" "TYP1"
# genes linked to pericytevSMC : "ACTA2","TAGLN"
# were removed from the following list
known_markers <- c("DSC3","DSP","LGALS7","KRT5","KRT14","COL17A1","KRT1","KRT10","LOR", "SPINK5","KRT6B","GJB2","GJB6","ATP1B1","MGST1","FASN","DCD","AQP5","DCN","LUM","MMP2","SLPI","CTHRC1","MFAP5","TSPAN8","CCL19","APOE","CXCL2","CXCL3","EFEMP1","APCDD1","ID1","WIF1","COL18A1","PTGDS","ASPN","POSTN","GPC3","TNN","SFRP1","RGS5","MYL9","TPM2","RERGL","PECAM1","SELE","CLDN5","VWF","PROX1","LYVE1","HLA-DRA","FCER1G","TYROBP","AIF1","CD1C","CD207","CD68","RNASE1","ITGAX","CD3D","CD3E","CD52","IL7R","DCT","MLANA","PMEL","CDH19","NGFR","UBE2C","PCNA")
sce.patient1_HS <- sc_PCA(sce.patient1_HS, hvg.patient1_HS, known_markers)
## Adding IL7R to the gene set used for PCA in Patient1 HS
sce.patient1_SCC <- sc_PCA(sce.patient1_SCC, hvg.patient1_SCC, known_markers)
## Adding KRT10 APOE TNN AIF1 CD3E NGFR to the gene set used for PCA in Patient1 SCC
sce.patient2_HS <- sc_PCA(sce.patient2_HS, hvg.patient2_HS, known_markers)
## Adding COL17A1 FCER1G AIF1 CD207 ITGAX CD3E to the gene set used for PCA in Patient2 HS
sce.patient2_AK <- sc_PCA(sce.patient2_AK, hvg.patient2_AK, known_markers)
## Adding GPC3 TNN FCER1G ITGAX CD3E to the gene set used for PCA in Patient2 AK
results <- sc_cluster(sce.patient1_HS)
sce.patient1_HS <- results[[1]]
g.patient1_HS <- results[[2]]
results <- sc_cluster(sce.patient1_SCC)
sce.patient1_SCC <- results[[1]]
g.patient1_SCC <- results[[2]]
results <- sc_cluster(sce.patient2_HS)
sce.patient2_HS <- results[[1]]
g.patient2_HS <- results[[2]]
results <- sc_cluster(sce.patient2_AK)
sce.patient2_AK <- results[[1]]
g.patient2_AK <- results[[2]]
# genes not found in our data : "DCDN" "WISP2" "SEPP1" "TYP1"
# were removed from the list below
genes <- list(
# classification: https://www.sciencedirect.com/science/article/pii/S0923181120301985?via%3Dihub#bib0050
# KERATINOCYTE
keratinocyte = c("DSC3","DSP","LGALS7"),
#keratinocyte_basal = c("KRT5","KRT14","COL17A1"),
#keratinocyte_suprabasal = c("KRT1","KRT10"),
#keratinocyte_differentiated = c("LOR", "SPINK5"),
#keratinocyte_ORS = c("KRT6B"),
#keratinocyte_channel = c("GJB2","GJB6","ATP1B1"),
#keratinocyte_sebaceous_gland = c("MGST1","FASN"),
#keratinocyte_sweat_gland = c("DCD","AQP5"),
# FIBROBLAST
fibroblast = c("DCN","LUM","MMP2"),
# fibroblast subclassification from: https://www.nature.com/articles/s42003-020-0922-4#
#fibroblast_secretory_reticular = c(SLPI","CTHRC1","MFAP5","TSPAN8"),
#fibroblast_proinflammatory = c("CCL19","APOE","CXCL2","CXCL3","EFEMP1"),
#fibroblast_secretory_papillary = c("APCDD1","ID1","WIF1","COL18A1","PTGDS"),
#fibroblast_mesenchymal = c("ASPN","POSTN","GPC3","TNN","SFRP1"),
# PERICYTE & vSMC
pericytevSMC = c("ACTA2","TAGLN"), #vSMC : vascular smooth muscel cell
#pericytevSMC_pericyte = c("RGS5"),
#pericytevSMC_vSMC = c("MYL9","TPM2","RERGL"),
# ENDOTHELIAL CELL
endothelial = c("PECAM1","SELE","CLDN5","VWF"),
#endothelial_lymphatic = c("PROX1","LYVE1"),
# MYELOID CELL
myeloid = c("HLA-DRA","FCER1G","TYROBP","AIF1"),
#myeloid_dendritic = c("CD1C"),
#myeloid_langerhans = c("CD207"),
#myeloid_macrophage = c("CD68","RNASE1","ITGAX"),
# LYMPHOCYTE
lymphocyte = c("CD3D","CD3E","CD52","IL7R"),
# MELANOCYTE
melanocyte = c("DCT","MLANA","PMEL"),
# SCHWANN CELL
schwann = c("CDH19","NGFR"),
# MITOTIC CELL
mitotic = c("UBE2C","PCNA")
)
# TODO: try to move the convert_rownames step higher in a block without output so we're less
# likely to run it multiple time (which would lead to weird output)
#rownames(sce.patient1_HS) <- convert_rownames(sce.patient1_HS, g.patient1_HS, genes)
results <- annotate_cells(sce.patient1_HS, g.patient1_HS, genes)
sce.patient1_HS <- results[[1]]
km.patient1_HS <- results[[2]]
#rownames(sce.patient1_SCC) <- convert_rownames(sce.patient1_SCC, g.patient1_SCC, genes)
results <- annotate_cells(sce.patient1_SCC, g.patient1_SCC, genes)
sce.patient1_SCC <- results[[1]]
km.patient1_SCC <- results[[2]]
#rownames(sce.patient2_HS) <- convert_rownames(sce.patient2_HS, g.patient2_HS, genes)
results <- annotate_cells(sce.patient2_HS, g.patient2_HS, genes)
sce.patient2_HS <- results[[1]]
km.patient2_HS <- results[[2]]
#rownames(sce.patient2_AK) <- convert_rownames(sce.patient2_AK, g.patient2_AK, genes)
results <- annotate_cells(sce.patient2_AK, g.patient2_AK, genes)
sce.patient2_AK <- results[[1]]
km.patient2_AK <- results[[2]]
#Patient 1 HS
#km.patient1_HS
#km.patient1_SCC
#rowData(sce.patient1_HS[c("keratinocyte", "fibroblast", "endothelial", "myeloid", )])$Symbol
results <- pseudobulk(sce.patient1_HS, km.patient1_HS)
sce.patient1_HS <- results[[1]]
pb.patient1_HS <- results[[2]]
mat.patient1_HS <- results[[3]]
cl2.patient1_HS <- results[[4]]
#Patient 1 SCC
results <- pseudobulk(sce.patient1_SCC, km.patient1_SCC)
sce.patient1_SCC <- results[[1]]
pb.patient1_SCC <- results[[2]]
mat.patient1_SCC <- results[[3]]
cl2.patient1_SCC <- results[[4]]
#Patient 2 HS
results <- pseudobulk(sce.patient2_HS, km.patient2_HS)
sce.patient2_HS <- results[[1]]
pb.patient2_HS <- results[[2]]
mat.patient2_HS <- results[[3]]
cl2.patient2_HS <- results[[4]]
#Patient 2 AK
results <- pseudobulk(sce.patient2_AK, km.patient2_AK)
sce.patient2_AK <- results[[1]]
pb.patient2_AK <- results[[2]]
mat.patient2_AK <- results[[3]]
cl2.patient2_AK <- results[[4]]
# create barplots of cell type proportions
df.celltypes.patient1 <- rbind(df_barplot_celltypes(sce.patient1_HS), df_barplot_celltypes(sce.patient1_SCC))
df.celltypes.patient1$Type <- c("HS", "HS", "HS", "HS", "HS", "HS", "HS", "HS", "SCC", "SCC", "SCC", "SCC", "SCC", "SCC", "SCC", "SCC")
df.celltypes.patient2 <- rbind(df_barplot_celltypes(sce.patient2_HS), df_barplot_celltypes(sce.patient2_AK))
df.celltypes.patient2$Type <- c("HS", "HS", "HS", "HS", "HS", "HS", "HS", "HS", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK")
dynamic_barplot(df.celltypes.patient1, "Patient 1", T, "Proportion of cell types in", c(HS ='#999999', SCC = '#E69F00'))
dynamic_barplot(df.celltypes.patient2, "Patient 2", T, "Proportion of cell types in", c(HS ='#999999', AK = '#E69F00'))
##are those results significant or could they have happened by chance?
fisher_test_subtypes(sce.patient1_HS, sce.patient1_SCC, 'keratinocyte')
##
## Fisher's Exact Test for Count Data
##
## data: subtype_matrix
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.882766 2.415495
## sample estimates:
## odds ratio
## 2.131753
fisher_test_subtypes(sce.patient1_HS, sce.patient1_SCC, 'lymphocyte')
##
## Fisher's Exact Test for Count Data
##
## data: subtype_matrix
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.2368988 0.3284277
## sample estimates:
## odds ratio
## 0.2791748
fisher_test_subtypes(sce.patient1_HS, sce.patient1_SCC, 'fibroblast')
##
## Fisher's Exact Test for Count Data
##
## data: subtype_matrix
## p-value = 2.8e-08
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.3948515 0.6476806
## sample estimates:
## odds ratio
## 0.5061342
known_markers.kera <- c("KRT5","KRT14","COL17A1","KRT1","KRT10","LOR", "SPINK5","KRT6B","GJB2","GJB6","ATP1B1","MGST1","FASN","DCD","AQP5")
#annotate the subclusters
genes.kera <- list(
# classification: https://www.sciencedirect.com/science/article/pii/S0923181120301985?via%3Dihub#bib0050
# KERATINOCYTE
#keratinocyte = c("DSC3","DSP","LGALS7"),
basal = c("KRT5","KRT14","COL17A1"),
suprabasal = c("KRT1","KRT10"),
differentiated = c("LOR", "SPINK5"),
ORS = c("KRT6B"),
channel = c("GJB2","GJB6","ATP1B1"),
sebaceous_gland = c("MGST1","FASN"),
sweat_gland = c("DCD","AQP5")
)
# TODO : find a way to avoid giving both known_merkers.kera and genes.kera since both contain the
# same information in essence. Maybe will be fixed by the first todo
# yeah but then we are running an additional for loop/maybe we want different ones (hypothetically at least)
sce.patient1_HS.kera <- cluster_subtype(sce.patient1_HS, "keratinocyte", known_markers.kera, genes.kera)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
##
|
| | 0%
|
|== | 3%
|
|==== | 6%
|
|====== | 9%
|
|======== | 12%
|
|========== | 15%
|
|============ | 18%
|
|============== | 21%
|
|================ | 24%
|
|=================== | 26%
|
|===================== | 29%
|
|======================= | 32%
|
|========================= | 35%
|
|=========================== | 38%
|
|============================= | 41%
|
|=============================== | 44%
|
|================================= | 47%
|
|=================================== | 50%
|
|===================================== | 53%
|
|======================================= | 56%
|
|========================================= | 59%
|
|=========================================== | 62%
|
|============================================= | 65%
|
|=============================================== | 68%
|
|================================================= | 71%
|
|=================================================== | 74%
|
|====================================================== | 76%
|
|======================================================== | 79%
|
|========================================================== | 82%
|
|============================================================ | 85%
|
|============================================================== | 88%
|
|================================================================ | 91%
|
|================================================================== | 94%
|
|==================================================================== | 97%
|
|======================================================================| 100%
## Adding AQP5 to the gene set used for PCA in Patient1 HS
sce.patient1_SCC.kera <- cluster_subtype(sce.patient1_SCC, "keratinocyte", known_markers.kera, genes.kera)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
##
|
| | 0%
|
|== | 3%
|
|==== | 6%
|
|======= | 9%
|
|========= | 12%
|
|=========== | 16%
|
|============= | 19%
|
|=============== | 22%
|
|================== | 25%
|
|==================== | 28%
|
|====================== | 31%
|
|======================== | 34%
|
|========================== | 38%
|
|============================ | 41%
|
|=============================== | 44%
|
|================================= | 47%
|
|=================================== | 50%
|
|===================================== | 53%
|
|======================================= | 56%
|
|========================================== | 59%
|
|============================================ | 62%
|
|============================================== | 66%
|
|================================================ | 69%
|
|================================================== | 72%
|
|==================================================== | 75%
|
|======================================================= | 78%
|
|========================================================= | 81%
|
|=========================================================== | 84%
|
|============================================================= | 88%
|
|=============================================================== | 91%
|
|================================================================== | 94%
|
|==================================================================== | 97%
|
|======================================================================| 100%
## Adding FASN to the gene set used for PCA in Patient1 SCC
sce.patient2_HS.kera <- cluster_subtype(sce.patient2_HS, "keratinocyte", known_markers.kera, genes.kera)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
##
|
| | 0%
|
|== | 3%
|
|==== | 6%
|
|====== | 9%
|
|======== | 12%
|
|========== | 15%
|
|============ | 18%
|
|============== | 21%
|
|================ | 24%
|
|=================== | 26%
|
|===================== | 29%
|
|======================= | 32%
|
|========================= | 35%
|
|=========================== | 38%
|
|============================= | 41%
|
|=============================== | 44%
|
|================================= | 47%
|
|=================================== | 50%
|
|===================================== | 53%
|
|======================================= | 56%
|
|========================================= | 59%
|
|=========================================== | 62%
|
|============================================= | 65%
|
|=============================================== | 68%
|
|================================================= | 71%
|
|=================================================== | 74%
|
|====================================================== | 76%
|
|======================================================== | 79%
|
|========================================================== | 82%
|
|============================================================ | 85%
|
|============================================================== | 88%
|
|================================================================ | 91%
|
|================================================================== | 94%
|
|==================================================================== | 97%
|
|======================================================================| 100%
## Adding FASN to the gene set used for PCA in Patient2 HS
sce.patient2_AK.kera <- cluster_subtype(sce.patient2_AK, "keratinocyte", known_markers.kera, genes.kera)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
##
|
| | 0%
|
|== | 3%
|
|==== | 6%
|
|======= | 9%
|
|========= | 12%
|
|=========== | 16%
|
|============= | 19%
|
|=============== | 22%
|
|================== | 25%
|
|==================== | 28%
|
|====================== | 31%
|
|======================== | 34%
|
|========================== | 38%
|
|============================ | 41%
|
|=============================== | 44%
|
|================================= | 47%
|
|=================================== | 50%
|
|===================================== | 53%
|
|======================================= | 56%
|
|========================================== | 59%
|
|============================================ | 62%
|
|============================================== | 66%
|
|================================================ | 69%
|
|================================================== | 72%
|
|==================================================== | 75%
|
|======================================================= | 78%
|
|========================================================= | 81%
|
|=========================================================== | 84%
|
|============================================================= | 88%
|
|=============================================================== | 91%
|
|================================================================== | 94%
|
|==================================================================== | 97%
|
|======================================================================| 100%
## Adding KRT5 to the gene set used for PCA in Patient2 AK
genes.dyn <- list(
# classification: https://www.sciencedirect.com/science/article/pii/S0092867420306723
keratinocyte_basal = "COL17A1",
keratinocyte_cycling = "MKI67",
keratinocyte_differentiating = "KRT1"
)
# dynamic_plot(sce.patient1_HS.kera, g.patient1_HS.kera, genes.dyn)
# dynamic_plot(sce.patient1_SCC.kera, g.patient1_SCC.kera, genes.dyn)
# dynamic_plot(sce.patient2_HS.kera, g.patient2_HS.kera, genes.dyn)
# dynamic_plot(sce.patient2_AK.kera, g.patient2_AK.kera, genes.dyn)
## lelo_new plot
kera.dyn.patient1_HS <- dynamic_plot(sce.patient1_HS.kera, g.patient1_HS.kera, genes.dyn)
## class: SingleCellExperiment
## dim: 16970 1262
## metadata(2): Samples scDblFinder.stats
## assays(2): counts logcounts
## rownames(16970): ENSG00000241860.AL627309.5 ENSG00000237491.LINC01409
## ... ENSG00000278817.AC007325.4 ENSG00000277196.AC007325.2
## rowData names(4): ID Symbol Type scDblFinder.selected
## colnames(1262): AAACCTGAGGGCTCTC-1 AAACCTGCATAGAAAC-1 ...
## TTTGTCACAATCGGTT-1 TTTGTCAGTTCACCTC-1
## colData names(15): Sample Barcode ... cluster2 cluster3
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(1): Antibody Capture
kera.dyn.patient1_SCC <- dynamic_plot(sce.patient1_SCC.kera, g.patient1_SCC.kera, genes.dyn)
## class: SingleCellExperiment
## dim: 15718 504
## metadata(2): Samples scDblFinder.stats
## assays(2): counts logcounts
## rownames(15718): ENSG00000237491.LINC01409 ENSG00000228794.LINC01128
## ... ENSG00000277666.AC136352.3 ENSG00000273554.AC136616.1
## rowData names(4): ID Symbol Type scDblFinder.selected
## colnames(504): AAACCTGGTAGATTAG-1 AAACGGGGTCCGTGAC-1 ...
## TTTGCGCGTCGGCATC-1 TTTGGTTAGTCATGCT-1
## colData names(15): Sample Barcode ... cluster2 cluster3
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(1): Antibody Capture
kera.dyn.patient2_HS <- dynamic_plot(sce.patient2_HS.kera, g.patient2_HS.kera, genes.dyn)
## class: SingleCellExperiment
## dim: 16987 2468
## metadata(2): Samples scDblFinder.stats
## assays(2): counts logcounts
## rownames(16987): ENSG00000237491.LINC01409 ENSG00000228794.LINC01128
## ... ENSG00000278817.AC007325.4 ENSG00000277196.AC007325.2
## rowData names(4): ID Symbol Type scDblFinder.selected
## colnames(2468): AAACCTGGTCAAACTC-1 AAACGGGAGGAGCGAG-1 ...
## TTTGTCATCAGTCAGT-1 TTTGTCATCTATCCCG-1
## colData names(15): Sample Barcode ... cluster2 cluster3
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(1): Antibody Capture
kera.dyn.patient2_AK <- dynamic_plot(sce.patient2_AK.kera, g.patient2_AK.kera, genes.dyn)
## class: SingleCellExperiment
## dim: 15646 445
## metadata(2): Samples scDblFinder.stats
## assays(2): counts logcounts
## rownames(15646): ENSG00000241860.AL627309.5 ENSG00000237491.LINC01409
## ... ENSG00000273554.AC136616.1 ENSG00000277196.AC007325.2
## rowData names(4): ID Symbol Type scDblFinder.selected
## colnames(445): AAAGTAGCACCGCTAG-1 AAATGCCTCGGAATCT-1 ...
## TTTGGTTCATGAACCT-1 TTTGGTTGTGGTAACG-1
## colData names(15): Sample Barcode ... cluster2 cluster3
## reducedDimNames(3): PCA TSNE UMAP
## altExpNames(1): Antibody Capture
kera.dyn.patient1 <- rbind(kera.dyn.patient1_HS, kera.dyn.patient1_SCC)
kera.dyn.patient1$Type <- c("HS", "HS", "HS", "SCC", "SCC", "SCC")
kera.dyn.patient2 <- rbind(kera.dyn.patient2_HS, kera.dyn.patient2_AK)
kera.dyn.patient2$Type <- c("HS", "HS", "HS", "AK", "AK", "AK")
dynamic_barplot(kera.dyn.patient1, "Patient 1", F, "Proportion of Keratinocyte states in", c(HS ='#999999', SCC = '#E69F00'))
dynamic_barplot(kera.dyn.patient2, "Patient 2", F, "Proportion of Keratinocyte states in", c(HS ='#999999', AK = '#E69F00'))
known_markers.fibro <- c("SLPI","CTHRC1","MFAP5","TSPAN8","CCL19","APOE","CXCL2","CXCL3","EFEMP1","APCDD1","ID1","WIF1","COL18A1","PTGDS","ASPN","POSTN","GPC3","TNN","SFRP1")
genes.fibro <- list(
#fibroblast = c("DCN","LUM","MMP2"),
# # fibroblast subclassification from: https://www.nature.com/articles/s42003-020-0922-4#
fibroblast_secretory_reticular = c("SLPI","CTHRC1","MFAP5","TSPAN8"),
fibroblast_proinflammatory = c("CCL19","APOE","CXCL2","CXCL3","EFEMP1"),
fibroblast_secretory_papillary = c("APCDD1","ID1","WIF1","COL18A1","PTGDS"),
fibroblast_mesenchymal = c("ASPN","POSTN","GPC3","TNN","SFRP1")
)
sce.patient1_HS.fibro <- cluster_subtype(sce.patient1_HS, "fibroblast", known_markers.fibro, genes.fibro)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
##
|
| | 0%
|
|=== | 4%
|
|====== | 9%
|
|========= | 13%
|
|============ | 17%
|
|=============== | 22%
|
|================== | 26%
|
|===================== | 30%
|
|======================== | 35%
|
|=========================== | 39%
|
|============================== | 43%
|
|================================= | 48%
|
|===================================== | 52%
|
|======================================== | 57%
|
|=========================================== | 61%
|
|============================================== | 65%
|
|================================================= | 70%
|
|==================================================== | 74%
|
|======================================================= | 78%
|
|========================================================== | 83%
|
|============================================================= | 87%
|
|================================================================ | 91%
|
|=================================================================== | 96%
|
|======================================================================| 100%
## Adding WIF1 to the gene set used for PCA in Patient1 HS
sce.patient1_SCC.fibro <- cluster_subtype(sce.patient1_SCC, "fibroblast", known_markers.fibro, genes.fibro)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
##
|
| | 0%
|
|=== | 4%
|
|===== | 7%
|
|======== | 11%
|
|========== | 15%
|
|============= | 19%
|
|================ | 22%
|
|================== | 26%
|
|===================== | 30%
|
|======================= | 33%
|
|========================== | 37%
|
|============================= | 41%
|
|=============================== | 44%
|
|================================== | 48%
|
|==================================== | 52%
|
|======================================= | 56%
|
|========================================= | 59%
|
|============================================ | 63%
|
|=============================================== | 67%
|
|================================================= | 70%
|
|==================================================== | 74%
|
|====================================================== | 78%
|
|========================================================= | 81%
|
|============================================================ | 85%
|
|============================================================== | 89%
|
|================================================================= | 93%
|
|=================================================================== | 96%
|
|======================================================================| 100%
## Adding CCL19 to the gene set used for PCA in Patient1 SCC
sce.patient2_HS.fibro <- cluster_subtype(sce.patient2_HS, "fibroblast", known_markers.fibro, genes.fibro)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
##
|
| | 0%
|
|== | 4%
|
|===== | 7%
|
|======== | 11%
|
|========== | 14%
|
|============ | 18%
|
|=============== | 21%
|
|================== | 25%
|
|==================== | 29%
|
|====================== | 32%
|
|========================= | 36%
|
|============================ | 39%
|
|============================== | 43%
|
|================================ | 46%
|
|=================================== | 50%
|
|====================================== | 54%
|
|======================================== | 57%
|
|========================================== | 61%
|
|============================================= | 64%
|
|================================================ | 68%
|
|================================================== | 71%
|
|==================================================== | 75%
|
|======================================================= | 79%
|
|========================================================== | 82%
|
|============================================================ | 86%
|
|============================================================== | 89%
|
|================================================================= | 93%
|
|==================================================================== | 96%
|
|======================================================================| 100%
sce.patient2_AK.fibro <- cluster_subtype(sce.patient2_AK, "fibroblast", known_markers.fibro, genes.fibro)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
##
|
| | 0%
|
|==== | 6%
|
|======== | 11%
|
|============ | 17%
|
|================ | 22%
|
|=================== | 28%
|
|======================= | 33%
|
|=========================== | 39%
|
|=============================== | 44%
|
|=================================== | 50%
|
|======================================= | 56%
|
|=========================================== | 61%
|
|=============================================== | 67%
|
|=================================================== | 72%
|
|====================================================== | 78%
|
|========================================================== | 83%
|
|============================================================== | 89%
|
|================================================================== | 94%
|
|======================================================================| 100%
# The following code was used to check which genes were missing from in the data
# the known_markers are from the litterature
data_markers <- rowData(sce.patient1_HS)$Symbol
known_markers <- c("DSC3","DSP","LGALS7","KRT5","KRT14","COL17A1","KRT1","KRT10","LOR", "SPINK5","KRT6B","GJB2","GJB6","ATP1B1","MGST1","FASN","DCD","AQP5","DCN","LUM","MMP2","WISP2","SLPI","CTHRC1","MFAP5","TSPAN8","CCL19","APOE","CXCL2","CXCL3","EFEMP1","APCDD1","ID1","WIF1","COL18A1","PTGDS","ASPN","POSTN","GPC3","TNN","SFRP1","ACTA2","TAGLN","RGS5","MYL9","TPM2","RERGL","PECAM1","SELE","CLDN5","VWF","PROX1","LYVE1","HLA-DRA","FCER1G","TYROBP","AIF1","CD1C","CD207","CD68","RNASE1","SEPP1","ITGAX","CD3D","CD3E","CD52","IL7R","DCT","MLANA","PMEL","TYP1","CDH19","NGFR","UBE2C","PCNA")
idx_notfound <- which(!(known_markers %in% rowData(sce.patient1_HS)$Symbol))
print(paste("Done:", length(known_markers)-length(idx_notfound), "/", length(known_markers), "markers found"))
cat("Genes missing in dataset: ", known_markers[idx_notfound])
Kim, Doyoung, Kyung Bae Chung, and Tae-Gyun Kim. 2020. “Application of Single-Cell Rna Sequencing on Human Skin: Technical Evolution and Challenges.” Journal of Dermatological Science 99 (2): 74–81. doi:https://doi.org/10.1016/j.jdermsci.2020.06.002.
Solé-Boldo, Raddatz, L. 2020. “Single-Cell Transcriptomes of the Human Skin Reveal Age-Related Loss of Fibroblast Priming.” Communications Biology 3 (188). doi:https://doi.org/10.1038/s42003-020-0922-4.